@manuelkiessling manuelkiessling commented Feb 10, 2026

Closes #90.

PhotoBuilder — AI Image Generation for Content Pages

Summary

New vertical src/PhotoBuilder/ that generates AI-driven images matching the visual tone and content of a web page. Users launch PhotoBuilder from the Content Editor, receive AI-generated image prompts based on the page HTML, review/edit prompts, generate images, upload them to the S3 media store, and embed them back into the page — all within one integrated workflow.

Architecture

Vertical slice in src/PhotoBuilder/ following the established Domain / Infrastructure / Presentation layering, communicating with other verticals exclusively via facades.

graph LR
    PhotoBuilder -->|"readWorkspaceFile (dist HTML)"| WorkspaceMgmt
    PhotoBuilder -->|"getProjectInfo (LLM config, S3)"| ProjectMgmt
    PhotoBuilder -->|"uploadAsset (S3)"| RemoteContentAssets
    PhotoBuilder -->|"findAvailableFileNames (manifest polling)"| RemoteContentAssets
    PhotoBuilder -->|"getAccountInfoByEmail"| Account
    ChatBasedContentEditor -.->|"CTA link in dist files"| PhotoBuilder

Facade dependencies (documented in docs/vertical-wiring.md):

  • PhotoBuilder → WorkspaceMgmt: readWorkspaceFile (page HTML for prompt context)
  • PhotoBuilder → ProjectMgmt: getProjectInfo (LLM API keys, S3 credentials, provider config)
  • PhotoBuilder → RemoteContentAssets: uploadAsset, findAvailableFileNames (S3 upload + CDN manifest polling)
  • PhotoBuilder → Account: getAccountInfoByEmail (access validation)

Vertical Structure

src/PhotoBuilder/
├── Domain/
│   ├── Dto/
│   │   └── ImagePromptResultDto.php
│   ├── Entity/
│   │   ├── PhotoSession.php
│   │   └── PhotoImage.php
│   ├── Enum/
│   │   ├── PhotoSessionStatus.php
│   │   └── PhotoImageStatus.php
│   └── Service/
│       └── PhotoBuilderService.php
├── Infrastructure/
│   ├── Adapter/
│   │   ├── PromptGeneratorInterface.php
│   │   ├── OpenAiPromptGenerator.php
│   │   ├── ImageGeneratorInterface.php
│   │   ├── ImageGeneratorFactory.php
│   │   ├── OpenAiImageGenerator.php
│   │   ├── GeminiImageGenerator.php
│   │   ├── ImagePromptAgent.php
│   │   └── PatchedGemini.php
│   ├── Handler/
│   │   ├── GenerateImagePromptsHandler.php
│   │   └── GenerateImageHandler.php
│   ├── Message/
│   │   ├── GenerateImagePromptsMessage.php
│   │   └── GenerateImageMessage.php
│   └── Storage/
│       └── GeneratedImageStorage.php
├── Presentation/
│   ├── Controller/
│   │   └── PhotoBuilderController.php
│   └── Resources/
│       ├── assets/controllers/
│       │   ├── photo_builder_controller.ts
│       │   └── photo_image_controller.ts
│       └── templates/
│           └── photo_builder.twig
└── TestHarness/
    ├── FakeImageGenerator.php
    └── FakePromptGenerator.php

Domain Layer

Entities

PhotoSession — tracks one photo generation session per page:

  • id (UUID), workspaceId, conversationId, pagePath
  • systemPrompt, userPrompt (LLM prompt context)
  • status (enum: generating_prompts, prompts_ready, generating_images, images_ready, failed)
  • createdAt

PhotoImage — tracks each generated image:

  • id (UUID), session (ManyToOne → PhotoSession), position
  • prompt, suggestedFileName (LLM-generated)
  • status (enum: pending, generating, completed, failed)
  • storagePath (relative path in var/photo-builder/)
  • uploadedToMediaStoreAt, uploadedFileName (S3 upload tracking)
  • errorMessage

Service

PhotoBuilderService orchestrates session lifecycle: creates sessions with IMAGE_COUNT empty image slots, updates prompts from LLM output, coordinates status transitions, and respects "keep" flags during prompt regeneration.

Infrastructure Layer

Multi-Provider LLM Support (OpenAI + Google Gemini)

The plan originally assumed OpenAI only. The implementation introduces a two-tier, multi-provider LLM configuration:

  • Content Editing (OpenAI-only): existing chat-based content editor
  • PhotoBuilder (OpenAI or Google Gemini): configurable per project, with fallback to content editing settings

Prompt generation uses a NeuronAI Agent with a deliver_image_prompt tool — the LLM calls this tool once per image, delivering structured {prompt, file_name} pairs. This tool-based approach avoids fragile JSON parsing. The agent supports both OpenAI and Gemini providers, parameterized via ImagePromptAgent.
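
A minimal sketch of the tool-based delivery idea, written in TypeScript for brevity although the actual agent is PHP/NeuronAI. The names `createPromptCollector`, `DeliverImagePromptArgs`, and `ImagePromptResult` are hypothetical; only the `deliver_image_prompt` tool name and the `{prompt, file_name}` shape come from the description above.

```typescript
// Assumed argument shape of one deliver_image_prompt tool call.
interface DeliverImagePromptArgs {
  prompt: string;
  file_name: string;
}

// Hypothetical result type; the PR uses ImagePromptResultDto on the PHP side.
interface ImagePromptResult {
  prompt: string;
  fileName: string;
}

// Collects one structured result per tool invocation, so nothing has to be
// parsed out of free-form model text.
function createPromptCollector() {
  const results: ImagePromptResult[] = [];
  return {
    // Would be registered as the handler behind the deliver_image_prompt tool.
    deliverImagePrompt(args: DeliverImagePromptArgs): string {
      results.push({ prompt: args.prompt, fileName: args.file_name });
      return `Stored prompt ${results.length}`;
    },
    results(): readonly ImagePromptResult[] {
      return results;
    },
  };
}
```

Because each call arrives as already-structured arguments, a malformed or truncated JSON document in the model's text output cannot break prompt extraction.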

Image generation uses direct HTTP calls:

  • OpenAiImageGenerator: OpenAI Images API (gpt-image-1; image data is returned base64-encoded in the b64_json field)
  • GeminiImageGenerator: Google Gemini API with native image generation (supports lo-res 1024px and hi-res 2048px modes)
  • ImageGeneratorFactory: selects the appropriate generator based on project configuration

PatchedGemini provider: NeuronAI's built-in Gemini provider only checks parts[0] for function calls, but Gemini 3 models return text/thought parts before function call parts. PatchedGemini scans all parts and reindexes correctly.
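
The scan-and-reindex idea can be sketched as follows. This is an illustration in TypeScript, not the PHP code of PatchedGemini; the `GeminiPart` shape is a simplified assumption about the response format, and `extractToolCalls` and `ToolCall` are hypothetical names.

```typescript
// Simplified, assumed shape of one response part: it carries either text
// (possibly flagged as a thought) or a functionCall.
interface GeminiPart {
  text?: string;
  thought?: boolean;
  functionCall?: { name: string; args: Record<string, unknown> };
}

interface ToolCall {
  index: number;
  name: string;
  args: Record<string, unknown>;
}

// Scans every part instead of only parts[0] and numbers the surviving
// function calls consecutively from 0.
function extractToolCalls(parts: GeminiPart[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const part of parts) {
    if (part.functionCall) {
      calls.push({
        index: calls.length,
        name: part.functionCall.name,
        args: part.functionCall.args,
      });
    }
  }
  return calls;
}
```

A check limited to `parts[0]` would return nothing for a response whose first part is text, which matches the symptom of zero delivered prompts.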

Async Processing (Symfony Messenger)

Two message/handler pairs, dispatched through the immediate transport:

  • GenerateImagePromptsHandler: loads session, reads page HTML via facade, runs prompt agent, updates images with prompts + filenames, dispatches individual image generation messages (respects "keep" flags)
  • GenerateImageHandler: generates a single image via the selected provider, saves to disk, updates entity, checks if all session images are done
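
The "all session images are done" check at the end of GenerateImageHandler might look roughly like this. It is a TypeScript sketch of PHP logic; `allImagesFinished` is a hypothetical name, and treating failed images as finished is an assumption, since the description does not say how failures affect the session.

```typescript
// Status values taken from the PhotoImage enum listed in the Domain section.
type PhotoImageStatus = "pending" | "generating" | "completed" | "failed";

// A session counts as finished once no image is still pending or generating.
// Assumption: a failed image does not block completion of the session.
function allImagesFinished(statuses: PhotoImageStatus[]): boolean {
  return (
    statuses.length > 0 &&
    statuses.every((status) => status === "completed" || status === "failed")
  );
}
```

Since each image is handled by its own message, every handler run would evaluate this check after saving its own result, and only the last one to finish would see it succeed.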

Messenger consumer scaled to 5 replicas (docker-compose.yml) for parallel image processing.

Image Storage

GeneratedImageStorage: filesystem adapter at var/photo-builder/{sessionId}/{position}.png with save/read/getAbsolutePath methods.
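
The path convention can be illustrated with a small TypeScript sketch. The function names and the `projectDir` parameter are hypothetical; only the `var/photo-builder/{sessionId}/{position}.png` layout comes from the description.

```typescript
// Base directory named in the description; storagePath is relative to it.
const PHOTO_BUILDER_DIR = "var/photo-builder";

// Relative path as it would be stored on PhotoImage.storagePath.
function relativeImagePath(sessionId: string, position: number): string {
  return `${sessionId}/${position}.png`;
}

// Absolute path for serving or uploading; projectDir is an assumed parameter.
function absoluteImagePath(projectDir: string, sessionId: string, position: number): string {
  return `${projectDir}/${PHOTO_BUILDER_DIR}/${relativeImagePath(sessionId, position)}`;
}
```

Keying files by session and position means a regenerated image overwrites its predecessor at the same path, which is why the frontend needs the cache-busting suffix mentioned under the UX features.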

Presentation Layer

Controller

PhotoBuilderController with routes for:

  • Page rendering (GET /photo-builder/{workspaceId})
  • Session management (create, poll status, regenerate prompts)
  • Image operations (regenerate single image, update prompt, serve file, upload to S3)
  • Manifest availability check (polls CDN manifests before redirect)

Access control via #[IsGranted('ROLE_USER')] with workspace/project ownership verification.

Frontend (Stimulus + Twig)

Two-controller architecture:

  • photo_builder_controller.ts — page orchestrator managing session lifecycle, polling, global state, and inter-controller coordination
  • photo_image_controller.ts — per-card controller for individual image state, prompt editing, and UI feedback

Key UX features implemented beyond the original plan:

  • Lo-res/hi-res resolution toggle (Google Gemini only) — fast iteration vs. higher quality
  • Upload feedback — per-image spinner + "Uploaded" checkmark on single-image upload, overlay + "Uploading images, please wait..." on bulk embed
  • Manifest availability polling — after S3 upload, polls CDN manifests (3s intervals, 90s timeout) before redirecting to content editor, preventing broken image references
  • User prompt preservation — local edits in the "Additional image style instructions" textarea are not overwritten by poll responses
  • Cache-busting — timestamp suffix on image URLs after regeneration to defeat browser cache
  • Prompt language — generated prompts match the page's locale
  • "Keep prompt" during regeneration — users can protect individual prompts from being overwritten when regenerating all prompts
  • Parent-to-child event dispatch — DOM events bubble upward only; parent dispatches events directly on child elements (pattern documented in docs/frontendbook.md)
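
The manifest availability polling from the list above could be sketched like this. The 3s interval, the 90s timeout, and the `{ available, allAvailable }` response shape come from this PR; `waitForManifest`, `PollDeps`, and the injected `check`/`sleep` functions are hypothetical, and returning `false` on timeout is an assumption about how the caller is informed.

```typescript
interface AvailabilityResponse {
  available: string[];
  allAvailable: boolean;
}

// Dependencies are injected so the loop can run without a network or timers.
interface PollDeps {
  check: (fileNames: string[]) => Promise<AvailabilityResponse>;
  sleep: (ms: number) => Promise<void>;
}

const POLL_INTERVAL_MS = 3000;
const POLL_TIMEOUT_MS = 90000;

// Resolves to true once every file is reported available, or to false when
// the timeout has elapsed without a positive answer.
async function waitForManifest(fileNames: string[], deps: PollDeps): Promise<boolean> {
  let waitedMs = 0;
  while (true) {
    const response = await deps.check(fileNames);
    if (response.allAvailable) {
      return true;
    }
    if (waitedMs >= POLL_TIMEOUT_MS) {
      return false;
    }
    await deps.sleep(POLL_INTERVAL_MS);
    waitedMs += POLL_INTERVAL_MS;
  }
}
```

In a controller, `check` would wrap the POST to the availability endpoint and `sleep` would wrap `setTimeout`; the redirect to the content editor would only happen on a `true` result.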

Template

photo_builder.twig — responsive image grid with loading overlay, user prompt section, per-image cards (preview, prompt textarea, keep checkbox, regenerate/upload buttons), and embed CTA. Uses etfswui-* styleguide classes throughout.

Content Editor Integration

  • PhotoBuilder CTA in dist_files_controller.ts: camera icon next to each page file that navigates to PhotoBuilder
  • Prefilled chat message: after embedding, PhotoBuilder navigates back to the content editor with a prefill query parameter (for example ?prefill=Embed images a.jpg, b.jpg into page x.html), which pre-fills the instruction textarea
  • Chat-based content editor: reads prefillMessage from URL and populates the input
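
A sketch of how the prefill URL for the embed flow might be assembled. `buildPrefillUrl` and the editor URL are hypothetical; the `prefill` parameter name and the message wording follow the example in the list above, and the real code may use a different parameter name or a translated message.

```typescript
// Builds the content editor URL carrying the embed instruction.
// The message must be URL-encoded because it contains spaces and commas.
function buildPrefillUrl(editorUrl: string, fileNames: string[], pagePath: string): string {
  const message = `Embed images ${fileNames.join(", ")} into page ${pagePath}`;
  const separator = editorUrl.includes("?") ? "&" : "?";
  return `${editorUrl}${separator}prefill=${encodeURIComponent(message)}`;
}
```

The file names passed here would be the uploaded, hash-prefixed S3 names rather than the suggested names, so that the editor references files that actually exist in the media store.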

Project Settings: Hierarchical LLM Configuration

The existing single llmApiKey/llmModelProvider fields were renamed to contentEditingLlmApiKey/contentEditingLlmModelProvider (scoped to content editing). New optional photoBuilderLlm* fields were added with automatic fallback to content editing settings.
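
The fallback rule can be sketched as below. This is TypeScript for illustration only; the `photoBuilderLlmModelProvider` and `photoBuilderLlmApiKey` field names are assumed expansions of `photoBuilderLlm*`, and `effectivePhotoBuilderLlm` is a hypothetical helper.

```typescript
type LlmModelProvider = "openai" | "google";

interface ProjectLlmSettings {
  contentEditingLlmModelProvider: LlmModelProvider;
  contentEditingLlmApiKey: string;
  // Nullable: null means "use the same settings as content editing".
  photoBuilderLlmModelProvider: LlmModelProvider | null;
  photoBuilderLlmApiKey: string | null;
}

interface EffectiveLlm {
  provider: LlmModelProvider;
  apiKey: string;
}

// Uses the dedicated PhotoBuilder settings only when both are present,
// otherwise falls back to the content editing settings.
function effectivePhotoBuilderLlm(settings: ProjectLlmSettings): EffectiveLlm {
  if (settings.photoBuilderLlmModelProvider !== null && settings.photoBuilderLlmApiKey !== null) {
    return {
      provider: settings.photoBuilderLlmModelProvider,
      apiKey: settings.photoBuilderLlmApiKey,
    };
  }
  return {
    provider: settings.contentEditingLlmModelProvider,
    apiKey: settings.contentEditingLlmApiKey,
  };
}
```

Keeping the dedicated fields nullable lets existing projects keep working after the rename without any extra configuration.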

Project settings UI (project_form.twig) extended with:

  • Option A: "Use same settings as Content Editing" (default, one-click)
  • Option B: "Use dedicated provider" with provider radio (OpenAI/Google), API key input, and verification button
  • LLM key verification controller updated to resolve provider from the nearest fieldset

Model selection:

  • OpenAI: gpt-image-1 for image generation
  • Google: gemini-3-pro-image-preview for image generation, gemini-3-flash-preview for prompt generation

TestHarness

src/PhotoBuilder/TestHarness/ provides fake adapters for local development:

  • FakePromptGenerator: returns canned prompts without calling an LLM
  • FakeImageGenerator: generates placeholder images without API calls
  • Toggled via .env flags: PHOTO_BUILDER_SIMULATE_IMAGE_PROMPT_GENERATION, PHOTO_BUILDER_SIMULATE_IMAGE_GENERATION

Database Migrations

Four migrations:

  • Version20260210112717 — create photo_sessions and photo_images tables
  • Version20260211081136 — add uploaded_to_media_store_at to photo_images
  • Version20260211082223 — add uploaded_file_name to photo_images
  • Version20260211110000 — rename LLM fields to scoped names, add PhotoBuilder-specific LLM columns

Cross-Cutting Concerns

  • CSRF protection: token generated in Twig, passed to Stimulus as value, validated on all POST endpoints
  • Access control: #[IsGranted('ROLE_USER')] + workspace/project ownership verification
  • DateAndTimeService: used for all entity timestamps (no direct new DateTimeImmutable() calls)
  • LLM wire logging: prompt agent supports wire logger for debuggability
  • Translations: full EN + DE coverage for all PhotoBuilder UI strings
  • Language switcher: preserves page and conversationId query params when switching locale

Documentation

  • docs/vertical-wiring.md updated with PhotoBuilder facade dependencies
  • docs/frontendbook.md updated with parent-to-child event dispatch pattern
  • docs/llm-usage-book.md added — documents all LLM concerns and provider configuration

Test Coverage

PHP unit tests (in tests/Unit/PhotoBuilder/):

  • PhotoSession, PhotoImage entities
  • PhotoBuilderService
  • GeneratedImageStorage
  • OpenAiImageGenerator, GeminiImageGenerator
  • PatchedGemini provider
  • RemoteContentAssetsFacade (findAvailableFileNames)

Frontend tests (Vitest, in tests/frontend/unit/PhotoBuilder/):

  • photo_builder_controller.test.ts — session lifecycle, polling, prompt regeneration, upload flow, manifest polling, resolution toggle
  • photo_image_controller.test.ts — state updates, prompt editing, keep checkbox, button state management, cache-busting
  • Plus additional tests in dist_files_controller.test.ts (PhotoBuilder CTA) and chat_based_content_editor_controller.test.ts (prefill message)

Stats: 78 files changed, ~8,900 lines added, ~340 lines removed.

Made with Cursor

… unit tests

New vertical for AI image generation matching web page content:
- Domain: PhotoSession/PhotoImage entities, enums, PhotoBuilderService with IMAGE_COUNT constant
- Infrastructure: PromptGenerator (NeuronAI agent with deliver_image_prompt tool),
  ImageGenerator (OpenAI Images API), GeneratedImageStorage, Messenger messages/handlers
- Tests: 45 unit tests covering entities, service logic, storage, and image generator

Co-authored-by: Cursor <cursoragent@cursor.com>
@manuelkiessling manuelkiessling marked this pull request as draft February 10, 2026 10:57
@manuelkiessling manuelkiessling self-assigned this Feb 10, 2026
@manuelkiessling manuelkiessling added the enhancement New feature or request label Feb 10, 2026
manuelkiessling and others added 26 commits February 10, 2026 12:19
…translations

- PhotoBuilderController with all API endpoints (create session, poll, regenerate,
  serve image, upload to media store)
- Twig template with loading state, user prompt, responsive image grid, media store sidebar
- Two Stimulus controllers: photo_builder_controller.ts (orchestrator) and
  photo_image_controller.ts (per-card state management)
- EN+DE translations for all PhotoBuilder UI strings
- ImagePromptResultDto to replace associative arrays at boundaries
- Registered new controllers in bootstrap.ts and asset_mapper.yaml
- Service wiring in services.yaml, Twig namespace in twig.yaml
- All quality checks pass (PHPStan, ESLint, tsc, Prettier, PHP CS Fixer)

Co-authored-by: Cursor <cursoragent@cursor.com>
…s for PhotoBuilder

- Wire PhotoBuilder CTA (camera icon) into dist_files_controller for each page
- Add prefillMessage support to chat-based-content-editor controller for the
  "Embed generated images into content page" flow
- Register PhotoBuilder entities in doctrine.yaml and generate migration
  for photo_sessions and photo_images tables
- Add Vitest tests for photo_builder_controller (23 tests) and
  photo_image_controller (25 tests)
- Add tests for PhotoBuilder CTA in dist_files_controller (5 tests) and
  prefillMessage in chat_based_content_editor_controller (3 tests)

Co-authored-by: Cursor <cursoragent@cursor.com>
…ms, image serving

- Replace invalid placeholder strings (___SESSION_ID___) in Twig template
  with dummy UUIDs that satisfy Symfony route parameter requirements
- Use output_format instead of response_format for gpt-image-1 API
  (response_format is a dall-e-2/dall-e-3 parameter)
- Generate image URLs via Symfony router to include locale prefix,
  fixing broken image display due to missing /{_locale}/ in path
- Update vertical-wiring.md with PhotoBuilder facade dependencies
- Update corresponding unit and frontend tests

Co-authored-by: Cursor <cursoragent@cursor.com>
…dback, TestHarness

- Use etfswui-* styleguide classes on PhotoBuilder page (buttons, cards, forms)
- Add cursor-pointer to all CTAs via styleguide button classes
- Extract Remote Assets sidebar to @common.presentation/_remote_asset_browser_sidebar.html.twig
- Include shared partial in chat_based_content_editor and photo_builder
- Show 'Upload has been finished' banner on PhotoBuilder when upload completes (auto-hide 5s)
- Add PhotoBuilder TestHarness: FakePromptGenerator, FakeImageGenerator, env toggles
- PHOTO_BUILDER_SIMULATE_IMAGE_PROMPT_GENERATION and PHOTO_BUILDER_SIMULATE_IMAGE_GENERATION in .env
- Fix OpenAI image API (output_format for gpt-image-1), poll image URLs, Stimulus action wiring
- IMAGE_COUNT=1 for faster testing; docs/frontendbook.md and vertical-wiring.md updates

Co-authored-by: Cursor <cursoragent@cursor.com>
…er query params

- Show 'Upload has been finished' banner when image-card upload succeeds (not only sidebar)
- Regenerate prompts: overlay + spinner, clear unprotected prompt textareas on start
- Hide overlay when poll returns generating state; add regenerating_prompts translation
- Language switcher: preserve query string (page, conversationId) when switching locale on photo builder

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add uploadedToMediaStoreAt to PhotoImage to track S3 uploads
- Persist upload state in uploadToMediaStore endpoint; idempotent when already uploaded
- Include uploadedToMediaStore in poll response
- Change embedIntoPage to async: upload non-uploaded images first, show
  'Uploading images, please wait...' overlay, then navigate on success
- Add translations for uploading_images (EN/DE)
- Reset uploadedToMediaStoreAt when image is regenerated

Co-authored-by: Cursor <cursoragent@cursor.com>
…prompt

- Add uploadedFileName to PhotoImage for hash-prefixed S3 names in embed message
- Pass keepImageIds from regenerate prompts to handler; skip regenerating kept images
- Only dispatch image generation for changed prompts, not kept ones
- Clear uploaded state when prompt is regenerated

Co-authored-by: Cursor <cursoragent@cursor.com>
Dispatch clearPromptIfNotKept event on each child card element instead
of the parent — DOM events bubble upward, so dispatching on the parent
never reached child controllers. Also show "Generating..." text with
pulse animation immediately on non-kept prompts, disable buttons during
regeneration, and document the parent-to-child event pattern in
frontendbook.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ider configuration

Introduce a two-tier LLM configuration system: content editing (OpenAI-only)
and PhotoBuilder (OpenAI or Google Gemini). Projects can either reuse content
editing settings for image generation or configure a dedicated provider/key.

- Rename llmApiKey/llmModelProvider to contentEditing* scope across entity,
  DTOs, facades, controllers, templates, and tests
- Add nullable photoBuilder* LLM fields with fallback to content editing
- Extend LlmModelProvider enum with Google case and model selection methods
- Extend LlmModelName enum with gpt-image-1, gemini-3-pro-preview,
  gemini-3-pro-image-preview
- Implement GeminiImageGenerator adapter and ImageGeneratorFactory
- Parameterize ImagePromptAgent to support both OpenAI and Gemini providers
- Add Google API key verification via Gemini models endpoint
- Add PhotoBuilder LLM settings UI (Option A: reuse / Option B: dedicated)
  with provider selection, key input, verification, and one-click reuse
- Display active provider and model names on PhotoBuilder page
- Add docs/llm-usage-book.md documenting all LLM concerns and configuration

Co-authored-by: Cursor <cursoragent@cursor.com>
The Stimulus controller searched for the provider radio only within
its own element, missing sibling radios in the same fieldset. This
caused Google Gemini keys to be verified against OpenAI, always
failing. Widen the lookup scope to the closest fieldset/form ancestor.

Co-authored-by: Cursor <cursoragent@cursor.com>
…nly)

Lo-res mode (1K, default) enables faster iteration; hi-res mode (2K)
produces higher quality output. The toggle is only shown when the
effective PhotoBuilder provider is Google Gemini, since OpenAI always
generates 1024x1024. Switching modes re-generates all images client-side
using current prompts at the new resolution without a page reload.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove fixed container_name to allow scaling, add deploy.replicas: 5.

Co-authored-by: Cursor <cursoragent@cursor.com>
After uploading images to S3 via the "Embed into page" action, poll
the remote asset manifests until all uploaded filenames are confirmed
available before redirecting. This prevents the content editor from
referencing images that haven't propagated to the CDN yet.

- Add findAvailableFileNames() to RemoteContentAssetsFacade (basename
  matching against merged manifests) so the logic stays in the
  RemoteContentAssets vertical
- Add thin POST endpoint in PhotoBuilderController that delegates to
  the facade and returns { available, allAvailable }
- Frontend polls every 3s for up to 90s, showing a spinner overlay
- Includes PHP unit tests, frontend tests, and EN/DE translations

Co-authored-by: Cursor <cursoragent@cursor.com>
Use the faster and cheaper Flash model for generating image prompts
in PhotoBuilder when the Google provider is selected. Pro remains
the main text model for content editing.

Co-authored-by: Cursor <cursoragent@cursor.com>
NeuronAI's Gemini provider only checks parts[0] for functionCall,
but Gemini 3 models now return text/thought parts before functionCall
parts, causing all tool calls to be silently missed (0 prompts).

Introduce PatchedGemini provider that scans all parts and reindexes
the tools array after filtering out non-functionCall parts.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add in-progress and success feedback near single-image Upload CTA
  (spinner + 'Uploading…', then checkmark + 'Uploaded'; dispatch
  uploadComplete/uploadFailed to card)
- Fix upload/success spans always visible: use wrapper spans so only
  'hidden' is toggled (no inline-flex vs hidden conflict)
- Find card from event target for reliable completion/failure delivery
- Add translations: uploading_to_media_store, uploaded_to_media_store
- Stack Regenerate and Upload buttons vertically (flex-col) to fit space

Co-authored-by: Cursor <cursoragent@cursor.com>
…ons textarea

- Track lastAppliedUserPrompt; only apply server userPrompt from poll when
  current value matches it (or first load) so local edits are not overwritten
- Add unit test: user edit preserved when poll runs with textarea unfocused

Co-authored-by: Cursor <cursoragent@cursor.com>
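
The merge rule from the commit above can be sketched as a pure function. `lastAppliedUserPrompt` is named in the commit; the `UserPromptState` shape and `applyPolledUserPrompt` are hypothetical simplifications of the controller state.

```typescript
interface UserPromptState {
  // Current textarea content, possibly edited by the user.
  value: string;
  // Last server value written into the textarea; null before the first load.
  lastAppliedUserPrompt: string | null;
}

// Applies the polled server value only when the textarea still shows what the
// server last provided (or nothing has been applied yet). Otherwise the local
// edit wins and the state is returned unchanged.
function applyPolledUserPrompt(state: UserPromptState, serverPrompt: string): UserPromptState {
  const untouched =
    state.lastAppliedUserPrompt === null || state.value === state.lastAppliedUserPrompt;
  if (!untouched) {
    return state;
  }
  return { value: serverPrompt, lastAppliedUserPrompt: serverPrompt };
}
```

Comparing against the last applied value rather than checking focus is what keeps an edit intact even when the textarea is unfocused during a poll.
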
The change-detection optimization skipped dispatching stateChanged to
children when per-image data was unchanged, but children rely on that
event to re-read the parent's data-photo-builder-generating attribute
and enable/disable their Regenerate and Upload buttons. Now tracks
anyGenerating transitions and force-dispatches to all cards when it
changes.

Co-authored-by: Cursor <cursoragent@cursor.com>
…uster

The cache-buster is only applied when img.src is actually set (after a
regeneration cycle clears lastSetImageUrl), so repeated polls with the
same URL still skip re-assignment and avoid redundant fetches.

Co-authored-by: Cursor <cursoragent@cursor.com>
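
A sketch of the cache-buster behaviour described in the commit above. `lastSetImageUrl` is named in the commit; `nextImageSrc`, the `t` query parameter, and passing the timestamp as an argument are assumptions made for illustration.

```typescript
interface ImageCardState {
  // URL most recently assigned to the image; reset to null on regeneration.
  lastSetImageUrl: string | null;
}

// Returns the src to assign, or null when the image should be left untouched.
// The timestamp suffix is only added when an assignment actually happens, so
// repeated polls with the same URL cause no extra fetches.
function nextImageSrc(state: ImageCardState, polledUrl: string, nowMs: number): string | null {
  if (state.lastSetImageUrl === polledUrl) {
    return null;
  }
  state.lastSetImageUrl = polledUrl;
  const separator = polledUrl.includes("?") ? "&" : "?";
  return `${polledUrl}${separator}t=${nowMs}`;
}
```

A caller would pass `Date.now()` as `nowMs` and clear `lastSetImageUrl` when a regeneration starts, which forces one fresh fetch for the new image at the unchanged URL.
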
…poll

The promptAwaitingRegenerate logic only accepted new prompts when status
was "pending" or "generating". If image generation completed within one
poll cycle, status was already "completed" and the condition never
matched, leaving the textarea permanently stuck.

Now compares the incoming prompt against the saved pre-regeneration
prompt instead of checking status, correctly handling both fast
completions and stale old-data polls.

Co-authored-by: Cursor <cursoragent@cursor.com>
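
The fix above replaces a status check with a prompt comparison. A minimal sketch, assuming a simplified card state; `promptAwaitingRegenerate` is named in the commit, while `RegeneratingCardState`, `promptBeforeRegenerate`, and `applyPolledPrompt` are hypothetical.

```typescript
interface RegeneratingCardState {
  promptAwaitingRegenerate: boolean;
  // Prompt that was shown when regeneration was requested.
  promptBeforeRegenerate: string;
  // Text currently shown in the textarea.
  displayedPrompt: string;
}

// While a regenerated prompt is awaited, a poll is treated as stale when it
// still carries the pre-regeneration prompt. Any different prompt is accepted,
// regardless of whether the image status is pending, generating, or completed.
function applyPolledPrompt(state: RegeneratingCardState, incomingPrompt: string): RegeneratingCardState {
  if (!state.promptAwaitingRegenerate) {
    return state;
  }
  if (incomingPrompt === state.promptBeforeRegenerate) {
    return state;
  }
  return {
    promptAwaitingRegenerate: false,
    promptBeforeRegenerate: state.promptBeforeRegenerate,
    displayedPrompt: incomingPrompt,
  };
}
```

One limitation of this sketch is that a regenerated prompt identical to the previous one would never be accepted; whether the real controller handles that case is not stated in the commit.
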
- Redesign Preview Pages and PhotoBuilder CTA as distinct styleguide cards
- Add translatable Edit HTML / Preview labels with proper icons
- Make filenames clickable links to preview URLs
- Move AI model info into Behind the Scenes section
- Translate embed prefill message for German locale
- Right-align Content Editor action buttons, use styleguide classes
- Use etfswui-card-back-link for PhotoBuilder back link
- Remove unused flex-row sidebar layout so content uses full width

Co-authored-by: Cursor <cursoragent@cursor.com>
@manuelkiessling manuelkiessling marked this pull request as ready for review February 11, 2026 18:41
@manuelkiessling manuelkiessling merged commit a80f17b into main Feb 11, 2026
5 checks passed
